Overview

Dataset statistics

Number of variables: 15
Number of observations: 428
Missing cells: 2
Missing cells (%): < 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 50.3 KiB
Average record size in memory: 120.3 B
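
The derived figures in this summary can be checked from the raw counts. The following is a minimal sketch, assuming the missing percentage is taken over all cells (variables times observations) and that 1 KiB is 1024 bytes; the variable names are illustrative and not part of the report.

```python
# Values copied from the Dataset statistics listed above.
n_variables = 15
n_observations = 428
missing_cells = 2
total_size_kib = 50.3

# Every observation has one cell per variable.
total_cells = n_variables * n_observations

# Share of missing cells, as a percentage of all cells.
missing_pct = 100 * missing_cells / total_cells

# Average bytes per record, assuming 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.3f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

Under these assumptions the missing share comes out well below 0.1% and the average record size rounds to the 120.3 B shown in the summary.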

Variable types

Categorical: 7
Numeric: 8

Alerts

Model has a high cardinality: 425 distinct values (High cardinality)
MSRP has a high cardinality: 410 distinct values (High cardinality)
Invoice has a high cardinality: 425 distinct values (High cardinality)
EngineSize is highly correlated with Cylinders and 6 other fields (High correlation)
Cylinders is highly correlated with EngineSize and 6 other fields (High correlation)
Horsepower is highly correlated with EngineSize and 5 other fields (High correlation)
MPG_City is highly correlated with EngineSize and 6 other fields (High correlation)
MPG_Highway is highly correlated with EngineSize and 5 other fields (High correlation)
Weight is highly correlated with EngineSize and 6 other fields (High correlation)
Wheelbase is highly correlated with EngineSize and 6 other fields (High correlation)
Length is highly correlated with EngineSize and 4 other fields (High correlation)
EngineSize is highly correlated with Cylinders and 6 other fields (High correlation)
Cylinders is highly correlated with EngineSize and 6 other fields (High correlation)
Horsepower is highly correlated with EngineSize and 4 other fields (High correlation)
MPG_City is highly correlated with EngineSize and 6 other fields (High correlation)
MPG_Highway is highly correlated with EngineSize and 5 other fields (High correlation)
Weight is highly correlated with EngineSize and 6 other fields (High correlation)
Wheelbase is highly correlated with EngineSize and 5 other fields (High correlation)
Length is highly correlated with EngineSize and 4 other fields (High correlation)
EngineSize is highly correlated with Cylinders and 5 other fields (High correlation)
Cylinders is highly correlated with EngineSize and 5 other fields (High correlation)
Horsepower is highly correlated with EngineSize and 4 other fields (High correlation)
MPG_City is highly correlated with EngineSize and 4 other fields (High correlation)
MPG_Highway is highly correlated with EngineSize and 4 other fields (High correlation)
Weight is highly correlated with EngineSize and 6 other fields (High correlation)
Wheelbase is highly correlated with EngineSize and 3 other fields (High correlation)
Length is highly correlated with Weight and 1 other field (High correlation)
DriveTrain is highly correlated with Make (High correlation)
Make is highly correlated with DriveTrain and 1 other field (High correlation)
Origin is highly correlated with Make (High correlation)
Make is highly correlated with Origin and 9 other fields (High correlation)
Type is highly correlated with DriveTrain and 3 other fields (High correlation)
Origin is highly correlated with Make and 2 other fields (High correlation)
DriveTrain is highly correlated with Make and 6 other fields (High correlation)
EngineSize is highly correlated with Make and 9 other fields (High correlation)
Cylinders is highly correlated with Make and 6 other fields (High correlation)
Horsepower is highly correlated with Make and 8 other fields (High correlation)
MPG_City is highly correlated with Make and 8 other fields (High correlation)
MPG_Highway is highly correlated with Make and 7 other fields (High correlation)
Weight is highly correlated with Make and 7 other fields (High correlation)
Wheelbase is highly correlated with Make and 7 other fields (High correlation)
Length is highly correlated with Make and 4 other fields (High correlation)
Model is uniformly distributed (Uniform)
MSRP is uniformly distributed (Uniform)
Invoice is uniformly distributed (Uniform)

Reproduction

Analysis started: 2022-06-02 03:52:26.987585
Analysis finished: 2022-06-02 03:52:41.106126
Duration: 14.12 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json

Variables

Make
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 38
Distinct (%): 8.9%
Missing: 0
Missing (%): 0.0%
Memory size: 3.5 KiB

Value | Count
Toyota | 28
Chevrolet | 27
Mercedes-Benz | 26
Ford | 23
BMW | 20
Other values (33) | 304

Length

Max length: 13
Median length: 9
Mean length: 6.46728972
Min length: 3

Characters and Unicode

Total characters: 2768
Distinct characters: 46
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
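
To illustrate how character properties can be read programmatically, here is a minimal sketch using Python's standard `unicodedata` module. It counts Unicode general categories for a few Make values that appear in this report. The helper name `category_counts` is illustrative, and this is not necessarily how the report itself computes its tables.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over all characters in `values`."""
    counts = Counter()
    for value in values:
        for ch in value:
            # unicodedata.category returns a two-letter code such as
            # "Lu" (uppercase letter), "Ll" (lowercase letter),
            # "Pd" (dash punctuation) or "Zs" (space separator).
            counts[unicodedata.category(ch)] += 1
    return counts


# A few Make values taken from the Common Values table of this report.
sample = ["Toyota", "Mercedes-Benz", "BMW"]
counts = category_counts(sample)
print(dict(counts))
```

The two-letter codes correspond to the category names used in the tables of this report, for example `Ll` for "Lowercase Letter" and `Pd` for "Dash Punctuation".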

Unique

Unique: 1
Unique (%): 0.2%

Sample

1st row: Acura
2nd row: Acura
3rd row: Acura
4th row: Acura
5th row: Acura

Common Values

Value | Count | Frequency (%)
Toyota | 28 | 6.5%
Chevrolet | 27 | 6.3%
Mercedes-Benz | 26 | 6.1%
Ford | 23 | 5.4%
BMW | 20 | 4.7%
Audi | 19 | 4.4%
Honda | 17 | 4.0%
Nissan | 17 | 4.0%
Volkswagen | 15 | 3.5%
Chrysler | 15 | 3.5%
Other values (28) | 221 | 51.6%

Length

[Figure: Histogram of lengths of the category]

Value | Count | Frequency (%)
toyota | 28 | 6.5%
chevrolet | 27 | 6.3%
mercedes-benz | 26 | 6.0%
ford | 23 | 5.3%
bmw | 20 | 4.6%
audi | 19 | 4.4%
honda | 17 | 3.9%
nissan | 17 | 3.9%
chrysler | 15 | 3.5%
volkswagen | 15 | 3.5%
Other values (29) | 224 | 52.0%

Most occurring characters

Value | Count | Frequency (%)
e | 241 | 8.7%
a | 216 | 7.8%
o | 210 | 7.6%
r | 173 | 6.2%
i | 172 | 6.2%
n | 145 | 5.2%
u | 143 | 5.2%
s | 139 | 5.0%
d | 135 | 4.9%
l | 100 | 3.6%
Other values (36) | 1094 | 39.5%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 2220 | 80.2%
Uppercase Letter | 519 | 18.8%
Dash Punctuation | 26 | 0.9%
Space Separator | 3 | 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 241 | 10.9%
a | 216 | 9.7%
o | 210 | 9.5%
r | 173 | 7.8%
i | 172 | 7.7%
n | 145 | 6.5%
u | 143 | 6.4%
s | 139 | 6.3%
d | 135 | 6.1%
l | 100 | 4.5%
Other values (14) | 546 | 24.6%

Uppercase Letter
Value | Count | Frequency (%)
M | 89 | 17.1%
C | 58 | 11.2%
B | 55 | 10.6%
S | 36 | 6.9%
H | 30 | 5.8%
T | 28 | 5.4%
V | 27 | 5.2%
A | 26 | 5.0%
L | 23 | 4.4%
F | 23 | 4.4%
Other values (10) | 124 | 23.9%

Dash Punctuation
Value | Count | Frequency (%)
- | 26 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 3 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 2739 | 99.0%
Common | 29 | 1.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 241 | 8.8%
a | 216 | 7.9%
o | 210 | 7.7%
r | 173 | 6.3%
i | 172 | 6.3%
n | 145 | 5.3%
u | 143 | 5.2%
s | 139 | 5.1%
d | 135 | 4.9%
l | 100 | 3.7%
Other values (34) | 1065 | 38.9%

Common
Value | Count | Frequency (%)
- | 26 | 89.7%
(space) | 3 | 10.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2768 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
e | 241 | 8.7%
a | 216 | 7.8%
o | 210 | 7.6%
r | 173 | 6.2%
i | 172 | 6.2%
n | 145 | 5.2%
u | 143 | 5.2%
s | 139 | 5.0%
d | 135 | 4.9%
l | 100 | 3.6%
Other values (36) | 1094 | 39.5%

Model
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 425
Distinct (%): 99.3%
Missing: 0
Missing (%): 0.0%
Memory size: 3.5 KiB

Value | Count
G35 4dr | 2
C320 4dr | 2
C240 4dr | 2
MDX | 1
Marauder 4dr | 1
Other values (420) | 420

Length

Max length: 39
Median length: 30
Mean length: 14.51401869
Min length: 2

Characters and Unicode

Total characters: 6212
Distinct characters: 68
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 422
Unique (%): 98.6%
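
The figures for Model suggest that "Distinct" counts the different values present, while "Unique" counts the values that occur exactly once: 425 distinct values of which 422 are unique leaves 3 values that repeat, and the Common Values table lists three models with a count of 2 (422 + 3 × 2 = 428 rows). A minimal sketch of that distinction follows; the helper name and the toy list are illustrative only.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of distinct values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


# Toy data: one repeated model name and three that occur once.
toy = ["G35 4dr", "G35 4dr", "MDX", "TSX 4dr", "TL 4dr"]
distinct, unique = distinct_and_unique(toy)
print(distinct, unique)
```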

Sample

1st row: MDX
2nd row: RSX Type S 2dr
3rd row: TSX 4dr
4th row: TL 4dr
5th row: 3.5 RL 4dr

Common Values

Value | Count | Frequency (%)
G35 4dr | 2 | 0.5%
C320 4dr | 2 | 0.5%
C240 4dr | 2 | 0.5%
MDX | 1 | 0.2%
Marauder 4dr | 1 | 0.2%
Lancer OZ Rally 4dr auto | 1 | 0.2%
Galant ES 2.4L 4dr | 1 | 0.2%
Lancer LS 4dr | 1 | 0.2%
Lancer ES 4dr | 1 | 0.2%
Outlander LS | 1 | 0.2%
Other values (415) | 415 | 97.0%

Length

[Figure: Histogram of lengths of the category]

Value | Count | Frequency (%)
4dr | 197 | 15.7%
2dr | 94 | 7.5%
convertible | 41 | 3.3%
lx | 22 | 1.8%
ls | 21 | 1.7%
v6 | 19 | 1.5%
se | 17 | 1.4%
coupe | 15 | 1.2%
cab | 15 | 1.2%
s | 14 | 1.1%
Other values (421) | 800 | 63.7%
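
The lowercase table above appears to count individual words rather than whole values (for example, `4dr` is counted 197 times). A minimal sketch of such a word count follows, assuming tokens are produced by lowercasing each value and splitting on whitespace; the report's actual tokenization may differ, and the helper name `word_counts` is illustrative.

```python
from collections import Counter


def word_counts(values):
    """Count lowercase, whitespace-separated tokens across all values."""
    counts = Counter()
    for value in values:
        counts.update(value.lower().split())
    return counts


# The five sample rows listed for Model in this report.
sample_rows = ["MDX", "RSX Type S 2dr", "TSX 4dr", "TL 4dr", "3.5 RL 4dr"]
counts = word_counts(sample_rows)
print(counts.most_common(1))
```

On the five sample rows alone, `4dr` is already the most frequent token, which is consistent with it leading the table.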

Most occurring characters

Value | Count | Frequency (%)
(space) | 827 | 13.3%
r | 577 | 9.3%
d | 357 | 5.7%
a | 327 | 5.3%
e | 326 | 5.2%
4 | 237 | 3.8%
o | 228 | 3.7%
t | 227 | 3.7%
S | 209 | 3.4%
i | 205 | 3.3%
Other values (58) | 2692 | 43.3%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 3395 | 54.7%
Uppercase Letter | 1143 | 18.4%
Space Separator | 827 | 13.3%
Decimal Number | 756 | 12.2%
Other Punctuation | 44 | 0.7%
Dash Punctuation | 25 | 0.4%
Open Punctuation | 11 | 0.2%
Close Punctuation | 11 | 0.2%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
r | 577 | 17.0%
d | 357 | 10.5%
a | 327 | 9.6%
e | 326 | 9.6%
o | 228 | 6.7%
t | 227 | 6.7%
i | 205 | 6.0%
n | 194 | 5.7%
c | 146 | 4.3%
l | 144 | 4.2%
Other values (16) | 664 | 19.6%

Uppercase Letter
Value | Count | Frequency (%)
S | 209 | 18.3%
L | 136 | 11.9%
C | 100 | 8.7%
X | 96 | 8.4%
T | 91 | 8.0%
E | 73 | 6.4%
G | 63 | 5.5%
A | 58 | 5.1%
R | 50 | 4.4%
M | 43 | 3.8%
Other values (16) | 224 | 19.6%

Decimal Number
Value | Count | Frequency (%)
4 | 237 | 31.3%
2 | 141 | 18.7%
0 | 117 | 15.5%
3 | 79 | 10.4%
5 | 65 | 8.6%
6 | 38 | 5.0%
1 | 31 | 4.1%
8 | 26 | 3.4%
9 | 13 | 1.7%
7 | 9 | 1.2%

Other Punctuation
Value | Count | Frequency (%)
. | 39 | 88.6%
/ | 5 | 11.4%

Space Separator
Value | Count | Frequency (%)
(space) | 827 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 25 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 11 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 11 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 4538 | 73.1%
Common | 1674 | 26.9%

Most frequent character per script

Latin
Value | Count | Frequency (%)
r | 577 | 12.7%
d | 357 | 7.9%
a | 327 | 7.2%
e | 326 | 7.2%
o | 228 | 5.0%
t | 227 | 5.0%
S | 209 | 4.6%
i | 205 | 4.5%
n | 194 | 4.3%
c | 146 | 3.2%
Other values (42) | 1742 | 38.4%

Common
Value | Count | Frequency (%)
(space) | 827 | 49.4%
4 | 237 | 14.2%
2 | 141 | 8.4%
0 | 117 | 7.0%
3 | 79 | 4.7%
5 | 65 | 3.9%
. | 39 | 2.3%
6 | 38 | 2.3%
1 | 31 | 1.9%
8 | 26 | 1.6%
Other values (6) | 74 | 4.4%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 6212 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 827 | 13.3%
r | 577 | 9.3%
d | 357 | 5.7%
a | 327 | 5.3%
e | 326 | 5.2%
4 | 237 | 3.8%
o | 228 | 3.7%
t | 227 | 3.7%
S | 209 | 3.4%
i | 205 | 3.3%
Other values (58) | 2692 | 43.3%

Type
Categorical

HIGH CORRELATION

Distinct: 6
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Memory size: 3.5 KiB

Value | Count
Sedan | 262
SUV | 60
Sports | 49
Wagon | 30
Truck | 24

Length

Max length: 6
Median length: 5
Mean length: 4.841121495
Min length: 3

Characters and Unicode

Total characters: 2072
Distinct characters: 22
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: SUV
2nd row: Sedan
3rd row: Sedan
4th row: Sedan
5th row: Sedan

Common Values

Value | Count | Frequency (%)
Sedan | 262 | 61.2%
SUV | 60 | 14.0%
Sports | 49 | 11.4%
Wagon | 30 | 7.0%
Truck | 24 | 5.6%
Hybrid | 3 | 0.7%

Length

[Figure: Histogram of lengths of the category]

Category Frequency Plot

[Figure: category frequency plot]

Value | Count | Frequency (%)
sedan | 262 | 61.2%
suv | 60 | 14.0%
sports | 49 | 11.4%
wagon | 30 | 7.0%
truck | 24 | 5.6%
hybrid | 3 | 0.7%

Most occurring characters

Value | Count | Frequency (%)
S | 371 | 17.9%
a | 292 | 14.1%
n | 292 | 14.1%
d | 265 | 12.8%
e | 262 | 12.6%
o | 79 | 3.8%
r | 76 | 3.7%
U | 60 | 2.9%
V | 60 | 2.9%
t | 49 | 2.4%
Other values (12) | 266 | 12.8%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 1524 | 73.6%
Uppercase Letter | 548 | 26.4%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
a | 292 | 19.2%
n | 292 | 19.2%
d | 265 | 17.4%
e | 262 | 17.2%
o | 79 | 5.2%
r | 76 | 5.0%
t | 49 | 3.2%
s | 49 | 3.2%
p | 49 | 3.2%
g | 30 | 2.0%
Other values (6) | 81 | 5.3%

Uppercase Letter
Value | Count | Frequency (%)
S | 371 | 67.7%
U | 60 | 10.9%
V | 60 | 10.9%
W | 30 | 5.5%
T | 24 | 4.4%
H | 3 | 0.5%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 2072 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
S | 371 | 17.9%
a | 292 | 14.1%
n | 292 | 14.1%
d | 265 | 12.8%
e | 262 | 12.6%
o | 79 | 3.8%
r | 76 | 3.7%
U | 60 | 2.9%
V | 60 | 2.9%
t | 49 | 2.4%
Other values (12) | 266 | 12.8%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2072 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
S | 371 | 17.9%
a | 292 | 14.1%
n | 292 | 14.1%
d | 265 | 12.8%
e | 262 | 12.6%
o | 79 | 3.8%
r | 76 | 3.7%
U | 60 | 2.9%
V | 60 | 2.9%
t | 49 | 2.4%
Other values (12) | 266 | 12.8%

Origin
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 3
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 3.5 KiB

Value | Count
Asia | 158
USA | 147
Europe | 123

Length

Max length: 6
Median length: 4
Mean length: 4.231308411
Min length: 3

Characters and Unicode

Total characters: 1811
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Asia
2nd row: Asia
3rd row: Asia
4th row: Asia
5th row: Asia

Common Values

Value | Count | Frequency (%)
Asia | 158 | 36.9%
USA | 147 | 34.3%
Europe | 123 | 28.7%

Length

[Figure: Histogram of lengths of the category]

Category Frequency Plot

[Figure: category frequency plot]

Value | Count | Frequency (%)
asia | 158 | 36.9%
usa | 147 | 34.3%
europe | 123 | 28.7%

Most occurring characters

Value | Count | Frequency (%)
A | 305 | 16.8%
s | 158 | 8.7%
i | 158 | 8.7%
a | 158 | 8.7%
U | 147 | 8.1%
S | 147 | 8.1%
E | 123 | 6.8%
u | 123 | 6.8%
r | 123 | 6.8%
o | 123 | 6.8%
Other values (2) | 246 | 13.6%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 1089 | 60.1%
Uppercase Letter | 722 | 39.9%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
s | 158 | 14.5%
i | 158 | 14.5%
a | 158 | 14.5%
u | 123 | 11.3%
r | 123 | 11.3%
o | 123 | 11.3%
p | 123 | 11.3%
e | 123 | 11.3%

Uppercase Letter
Value | Count | Frequency (%)
A | 305 | 42.2%
U | 147 | 20.4%
S | 147 | 20.4%
E | 123 | 17.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 1811 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
A | 305 | 16.8%
s | 158 | 8.7%
i | 158 | 8.7%
a | 158 | 8.7%
U | 147 | 8.1%
S | 147 | 8.1%
E | 123 | 6.8%
u | 123 | 6.8%
r | 123 | 6.8%
o | 123 | 6.8%
Other values (2) | 246 | 13.6%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1811 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
A | 305 | 16.8%
s | 158 | 8.7%
i | 158 | 8.7%
a | 158 | 8.7%
U | 147 | 8.1%
S | 147 | 8.1%
E | 123 | 6.8%
u | 123 | 6.8%
r | 123 | 6.8%
o | 123 | 6.8%
Other values (2) | 246 | 13.6%

DriveTrain
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 3
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 3.5 KiB

Value | Count
Front | 226
Rear | 110
All | 92

Length

Max length: 5
Median length: 5
Mean length: 4.313084112
Min length: 3

Characters and Unicode

Total characters: 1846
Distinct characters: 10
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: All
2nd row: Front
3rd row: Front
4th row: Front
5th row: Front

Common Values

Value | Count | Frequency (%)
Front | 226 | 52.8%
Rear | 110 | 25.7%
All | 92 | 21.5%

Length

[Figure: Histogram of lengths of the category]

Category Frequency Plot

[Figure: category frequency plot]

Value | Count | Frequency (%)
front | 226 | 52.8%
rear | 110 | 25.7%
all | 92 | 21.5%

Most occurring characters

Value | Count | Frequency (%)
r | 336 | 18.2%
F | 226 | 12.2%
o | 226 | 12.2%
n | 226 | 12.2%
t | 226 | 12.2%
l | 184 | 10.0%
R | 110 | 6.0%
e | 110 | 6.0%
a | 110 | 6.0%
A | 92 | 5.0%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 1418 | 76.8%
Uppercase Letter | 428 | 23.2%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
r | 336 | 23.7%
o | 226 | 15.9%
n | 226 | 15.9%
t | 226 | 15.9%
l | 184 | 13.0%
e | 110 | 7.8%
a | 110 | 7.8%

Uppercase Letter
Value | Count | Frequency (%)
F | 226 | 52.8%
R | 110 | 25.7%
A | 92 | 21.5%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 1846 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
r | 336 | 18.2%
F | 226 | 12.2%
o | 226 | 12.2%
n | 226 | 12.2%
t | 226 | 12.2%
l | 184 | 10.0%
R | 110 | 6.0%
e | 110 | 6.0%
a | 110 | 6.0%
A | 92 | 5.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1846 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
r | 336 | 18.2%
F | 226 | 12.2%
o | 226 | 12.2%
n | 226 | 12.2%
t | 226 | 12.2%
l | 184 | 10.0%
R | 110 | 6.0%
e | 110 | 6.0%
a | 110 | 6.0%
A | 92 | 5.0%

MSRP
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 410
Distinct (%): 95.8%
Missing: 0
Missing (%): 0.0%
Memory size: 3.5 KiB

Value | Count
$33,995 | 2
$21,595 | 2
$29,995 | 2
$28,495 | 2
$23,495 | 2
Other values (405) | 418

Length

Max length: 8
Median length: 7
Mean length: 7.009345794
Min length: 7

Characters and Unicode

Total characters: 3000
Distinct characters: 12
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 392
Unique (%): 91.6%

Sample

1st row: $36,945
2nd row: $23,820
3rd row: $26,990
4th row: $33,195
5th row: $43,755
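
MSRP is profiled here as a categorical variable because its values are strings such as `$36,945`. The following is a minimal sketch of converting such strings to integers so the column could be analysed numerically, assuming every value is a dollar sign followed by comma-grouped digits; the helper name `parse_dollars` is illustrative and not part of the report.

```python
def parse_dollars(text):
    """Convert a string like '$36,945' to the integer 36945."""
    # Drop surrounding whitespace, the currency symbol and thousands separators.
    cleaned = text.strip().replace("$", "").replace(",", "")
    return int(cleaned)


# The five sample rows listed for MSRP in this report.
sample_rows = ["$36,945", "$23,820", "$26,990", "$33,195", "$43,755"]
prices = [parse_dollars(s) for s in sample_rows]
print(prices)
```

The same conversion would apply to Invoice, which uses the same string format in its sample rows.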

Common Values

Value | Count | Frequency (%)
$33,995 | 2 | 0.5%
$21,595 | 2 | 0.5%
$29,995 | 2 | 0.5%
$28,495 | 2 | 0.5%
$23,495 | 2 | 0.5%
$21,055 | 2 | 0.5%
$74,995 | 2 | 0.5%
$15,389 | 2 | 0.5%
$23,895 | 2 | 0.5%
$13,270 | 2 | 0.5%
Other values (400) | 408 | 95.3%

Length

[Figure: Histogram of lengths of the category]

Value | Count | Frequency (%)
33,995 | 2 | 0.5%
19,635 | 2 | 0.5%
21,595 | 2 | 0.5%
19,860 | 2 | 0.5%
49,995 | 2 | 0.5%
31,545 | 2 | 0.5%
35,940 | 2 | 0.5%
27,490 | 2 | 0.5%
25,700 | 2 | 0.5%
34,495 | 2 | 0.5%
Other values (400) | 408 | 95.3%

Most occurring characters

Value | Count | Frequency (%)
$ | 428 | 14.3%
, | 428 | 14.3%
5 | 334 | 11.1%
0 | 297 | 9.9%
2 | 281 | 9.4%
9 | 236 | 7.9%
3 | 218 | 7.3%
1 | 206 | 6.9%
4 | 200 | 6.7%
6 | 129 | 4.3%
Other values (2) | 243 | 8.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 2144 | 71.5%
Currency Symbol | 428 | 14.3%
Other Punctuation | 428 | 14.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
5 | 334 | 15.6%
0 | 297 | 13.9%
2 | 281 | 13.1%
9 | 236 | 11.0%
3 | 218 | 10.2%
1 | 206 | 9.6%
4 | 200 | 9.3%
6 | 129 | 6.0%
7 | 126 | 5.9%
8 | 117 | 5.5%

Currency Symbol
Value | Count | Frequency (%)
$ | 428 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
, | 428 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 3000 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
$ | 428 | 14.3%
, | 428 | 14.3%
5 | 334 | 11.1%
0 | 297 | 9.9%
2 | 281 | 9.4%
9 | 236 | 7.9%
3 | 218 | 7.3%
1 | 206 | 6.9%
4 | 200 | 6.7%
6 | 129 | 4.3%
Other values (2) | 243 | 8.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 3000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
$ | 428 | 14.3%
, | 428 | 14.3%
5 | 334 | 11.1%
0 | 297 | 9.9%
2 | 281 | 9.4%
9 | 236 | 7.9%
3 | 218 | 7.3%
1 | 206 | 6.9%
4 | 200 | 6.7%
6 | 129 | 4.3%
Other values (2) | 243 | 8.1%

Invoice
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 425
Distinct (%): 99.3%
Missing: 0
Missing (%): 0.0%
Memory size: 3.5 KiB

Value | Count
$19,638 | 2
$68,306 | 2
$14,207 | 2
$33,337 | 1
$28,318 | 1
Other values (420) | 420

Length

Max length: 8
Median length: 7
Mean length: 7.007009346
Min length: 6

Characters and Unicode

Total characters: 2999
Distinct characters: 12
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 422
Unique (%): 98.6%

Sample

1st row: $33,337
2nd row: $21,761
3rd row: $24,647
4th row: $30,299
5th row: $39,014

Common Values

Value | Count | Frequency (%)
$19,638 | 2 | 0.5%
$68,306 | 2 | 0.5%
$14,207 | 2 | 0.5%
$33,337 | 1 | 0.2%
$28,318 | 1 | 0.2%
$17,957 | 1 | 0.2%
$15,718 | 1 | 0.2%
$13,751 | 1 | 0.2%
$17,569 | 1 | 0.2%
$30,763 | 1 | 0.2%
Other values (415) | 415 | 97.0%

Length

[Figure: Histogram of lengths of the category]

Value | Count | Frequency (%)
19,638 | 2 | 0.5%
14,207 | 2 | 0.5%
68,306 | 2 | 0.5%
34,483 | 1 | 0.2%
29,566 | 1 | 0.2%
30,299 | 1 | 0.2%
39,014 | 1 | 0.2%
41,100 | 1 | 0.2%
79,978 | 1 | 0.2%
23,508 | 1 | 0.2%
Other values (415) | 415 | 97.0%

Most occurring characters

Value | Count | Frequency (%)
$ | 428 | 14.3%
, | 428 | 14.3%
2 | 300 | 10.0%
1 | 289 | 9.6%
3 | 251 | 8.4%
4 | 205 | 6.8%
8 | 194 | 6.5%
0 | 187 | 6.2%
5 | 186 | 6.2%
7 | 184 | 6.1%
Other values (2) | 347 | 11.6%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 2143 | 71.5%
Currency Symbol | 428 | 14.3%
Other Punctuation | 428 | 14.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
2 | 300 | 14.0%
1 | 289 | 13.5%
3 | 251 | 11.7%
4 | 205 | 9.6%
8 | 194 | 9.1%
0 | 187 | 8.7%
5 | 186 | 8.7%
7 | 184 | 8.6%
6 | 182 | 8.5%
9 | 165 | 7.7%

Currency Symbol
Value | Count | Frequency (%)
$ | 428 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
, | 428 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 2999 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
$ | 428 | 14.3%
, | 428 | 14.3%
2 | 300 | 10.0%
1 | 289 | 9.6%
3 | 251 | 8.4%
4 | 205 | 6.8%
8 | 194 | 6.5%
0 | 187 | 6.2%
5 | 186 | 6.2%
7 | 184 | 6.1%
Other values (2) | 347 | 11.6%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2999 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
$ | 428 | 14.3%
, | 428 | 14.3%
2 | 300 | 10.0%
1 | 289 | 9.6%
3 | 251 | 8.4%
4 | 205 | 6.8%
8 | 194 | 6.5%
0 | 187 | 6.2%
5 | 186 | 6.2%
7 | 184 | 6.1%
Other values (2) | 347 | 11.6%

EngineSize
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 43
Distinct (%): 10.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.196728972
Minimum: 1.3
Maximum: 8.3
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.5 KiB

Quantile statistics

Minimum: 1.3
5-th percentile: 1.7
Q1: 2.375
Median: 3
Q3: 3.9
95-th percentile: 5.3
Maximum: 8.3
Range: 7
Interquartile range (IQR): 1.525

Descriptive statistics

Standard deviation: 1.108594718
Coefficient of variation (CV): 0.3467903373
Kurtosis: 0.5419435378
Mean: 3.196728972
Median Absolute Deviation (MAD): 0.8
Skewness: 0.7081519825
Sum: 1368.2
Variance: 1.22898225
Monotonicity: Not monotonic
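
Several of these statistics are simple functions of the others. The following is a minimal sketch that recomputes the range, the interquartile range, the coefficient of variation and the variance from the EngineSize figures reported above; the variable names are illustrative.

```python
import math

# EngineSize figures copied from the report.
mean = 3.196728972
std = 1.108594718
minimum, maximum = 1.3, 8.3
q1, q3 = 2.375, 3.9

value_range = maximum - minimum   # Range = Maximum - Minimum
iqr = q3 - q1                     # IQR = Q3 - Q1
cv = std / mean                   # Coefficient of variation = std / mean
variance = std ** 2               # Variance = std squared

print(f"range={value_range:.3f} iqr={iqr:.3f} cv={cv:.6f} variance={variance:.6f}")
```

Each derived value agrees with the reported figure up to rounding, and the same relationships hold for the other numeric variables in this report.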
[Figure: Histogram with fixed size bins (bins=43)]

Value | Count | Frequency (%)
3 | 42 | 9.8%
3.5 | 34 | 7.9%
2 | 30 | 7.0%
2.5 | 26 | 6.1%
2.4 | 23 | 5.4%
1.8 | 23 | 5.4%
4.6 | 21 | 4.9%
4.2 | 20 | 4.7%
3.2 | 18 | 4.2%
3.8 | 17 | 4.0%
Other values (33) | 174 | 40.7%

Value | Count | Frequency (%)
1.3 | 2 | 0.5%
1.4 | 1 | 0.2%
1.5 | 6 | 1.4%
1.6 | 10 | 2.3%
1.7 | 4 | 0.9%
1.8 | 23 | 5.4%
1.9 | 3 | 0.7%
2 | 30 | 7.0%
2.2 | 15 | 3.5%
2.3 | 13 | 3.0%

Value | Count | Frequency (%)
8.3 | 1 | 0.2%
6.8 | 1 | 0.2%
6 | 6 | 1.4%
5.7 | 3 | 0.7%
5.6 | 2 | 0.5%
5.5 | 3 | 0.7%
5.4 | 2 | 0.5%
5.3 | 5 | 1.2%
5 | 8 | 1.9%
4.8 | 2 | 0.5%

Cylinders
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 7
Distinct (%): 1.6%
Missing: 2
Missing (%): 0.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.807511737
Minimum: 3
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.5 KiB

Quantile statistics

Minimum: 3
5-th percentile: 4
Q1: 4
Median: 6
Q3: 6
95-th percentile: 8
Maximum: 12
Range: 9
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.558442633
Coefficient of variation (CV): 0.2683494591
Kurtosis: 0.4403783249
Mean: 5.807511737
Median Absolute Deviation (MAD): 2
Skewness: 0.5927851991
Sum: 2474
Variance: 2.428743441
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=7)]

Value | Count | Frequency (%)
6 | 190 | 44.4%
4 | 136 | 31.8%
8 | 87 | 20.3%
5 | 7 | 1.6%
12 | 3 | 0.7%
10 | 2 | 0.5%
3 | 1 | 0.2%
(Missing) | 2 | 0.5%

Value | Count | Frequency (%)
3 | 1 | 0.2%
4 | 136 | 31.8%
5 | 7 | 1.6%
6 | 190 | 44.4%
8 | 87 | 20.3%
10 | 2 | 0.5%
12 | 3 | 0.7%

Value | Count | Frequency (%)
12 | 3 | 0.7%
10 | 2 | 0.5%
8 | 87 | 20.3%
6 | 190 | 44.4%
5 | 7 | 1.6%
4 | 136 | 31.8%
3 | 1 | 0.2%
0.2%

Horsepower
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 110
Distinct (%): 25.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 215.885514
Minimum: 73
Maximum: 500
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.5 KiB

Quantile statistics

Minimum: 73
5-th percentile: 115
Q1: 165
Median: 210
Q3: 255
95-th percentile: 338.25
Maximum: 500
Range: 427
Interquartile range (IQR): 90

Descriptive statistics

Standard deviation: 71.83603158
Coefficient of variation (CV): 0.3327505873
Kurtosis: 1.552158629
Mean: 215.885514
Median Absolute Deviation (MAD): 45
Skewness: 0.9303307363
Sum: 92399
Variance: 5160.415434
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Value | Count | Frequency (%)
200 | 17 | 4.0%
215 | 14 | 3.3%
210 | 14 | 3.3%
240 | 13 | 3.0%
225 | 13 | 3.0%
220 | 12 | 2.8%
140 | 12 | 2.8%
300 | 11 | 2.6%
170 | 11 | 2.6%
130 | 10 | 2.3%
Other values (100) | 301 | 70.3%

Value | Count | Frequency (%)
73 | 1 | 0.2%
93 | 1 | 0.2%
100 | 1 | 0.2%
103 | 5 | 1.2%
104 | 3 | 0.7%
108 | 5 | 1.2%
110 | 2 | 0.5%
115 | 6 | 1.4%
117 | 1 | 0.2%
119 | 2 | 0.5%

Value | Count | Frequency (%)
500 | 1 | 0.2%
493 | 3 | 0.7%
477 | 1 | 0.2%
450 | 1 | 0.2%
420 | 1 | 0.2%
390 | 4 | 0.9%
350 | 2 | 0.5%
349 | 2 | 0.5%
345 | 1 | 0.2%
340 | 6 | 1.4%

MPG_City
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 28
Distinct (%): 6.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20.06074766
Minimum: 10
Maximum: 60
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.5 KiB

Quantile statistics

Minimum: 10
5-th percentile: 14
Q1: 17
Median: 19
Q3: 21.25
95-th percentile: 29
Maximum: 60
Range: 50
Interquartile range (IQR): 4.25

Descriptive statistics

Standard deviation: 5.238217639
Coefficient of variation (CV): 0.2611177672
Kurtosis: 15.79114731
Mean: 20.06074766
Median Absolute Deviation (MAD): 2
Skewness: 2.782071803
Sum: 8586
Variance: 27.43892403
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=28)]

Value | Count | Frequency (%)
18 | 69 | 16.1%
20 | 57 | 13.3%
17 | 41 | 9.6%
21 | 38 | 8.9%
19 | 37 | 8.6%
16 | 31 | 7.2%
24 | 22 | 5.1%
26 | 22 | 5.1%
22 | 18 | 4.2%
15 | 17 | 4.0%
Other values (18) | 76 | 17.8%

Value | Count | Frequency (%)
10 | 2 | 0.5%
12 | 4 | 0.9%
13 | 12 | 2.8%
14 | 13 | 3.0%
15 | 17 | 4.0%
16 | 31 | 7.2%
17 | 41 | 9.6%
18 | 69 | 16.1%
19 | 37 | 8.6%
20 | 57 | 13.3%

Value | Count | Frequency (%)
60 | 1 | 0.2%
59 | 1 | 0.2%
46 | 1 | 0.2%
38 | 1 | 0.2%
36 | 1 | 0.2%
35 | 2 | 0.5%
33 | 1 | 0.2%
32 | 7 | 1.6%
31 | 1 | 0.2%
29 | 7 | 1.6%

MPG_Highway
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 33
Distinct (%): 7.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 26.84345794
Minimum: 12
Maximum: 66
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.5 KiB

Quantile statistics

Minimum: 12
5-th percentile: 18
Q1: 24
Median: 26
Q3: 29
95-th percentile: 36
Maximum: 66
Range: 54
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 5.741200717
Coefficient of variation (CV): 0.2138770917
Kurtosis: 6.045610681
Mean: 26.84345794
Median Absolute Deviation (MAD): 3
Skewness: 1.252395273
Sum: 11489
Variance: 32.96138567
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=33)]

Value | Count | Frequency (%)
26 | 54 | 12.6%
25 | 44 | 10.3%
28 | 38 | 8.9%
29 | 34 | 7.9%
27 | 28 | 6.5%
24 | 25 | 5.8%
30 | 24 | 5.6%
23 | 16 | 3.7%
21 | 16 | 3.7%
19 | 16 | 3.7%
Other values (23) | 133 | 31.1%

Value | Count | Frequency (%)
12 | 1 | 0.2%
13 | 1 | 0.2%
14 | 1 | 0.2%
16 | 2 | 0.5%
17 | 9 | 2.1%
18 | 11 | 2.6%
19 | 16 | 3.7%
20 | 13 | 3.0%
21 | 16 | 3.7%
22 | 13 | 3.0%

Value | Count | Frequency (%)
66 | 1 | 0.2%
51 | 2 | 0.5%
46 | 1 | 0.2%
44 | 1 | 0.2%
43 | 2 | 0.5%
40 | 3 | 0.7%
39 | 1 | 0.2%
38 | 3 | 0.7%
37 | 5 | 1.2%
36 | 5 | 1.2%

Weight
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 348
Distinct (%): 81.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3577.953271
Minimum: 1850
Maximum: 7190
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.5 KiB

Quantile statistics

Minimum: 1850
5-th percentile: 2513
Q1: 3104
Median: 3474.5
Q3: 3977.75
95-th percentile: 4995.45
Maximum: 7190
Range: 5340
Interquartile range (IQR): 873.75

Descriptive statistics

Standard deviation: 758.9832146
Coefficient of variation (CV): 0.2121277605
Kurtosis: 1.688788526
Mean: 3577.953271
Median Absolute Deviation (MAD): 428
Skewness: 0.8918242318
Sum: 1531364
Variance: 576055.5201
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Value | Count | Frequency (%)
3175 | 4 | 0.9%
3450 | 4 | 0.9%
3285 | 4 | 0.9%
3428 | 3 | 0.7%
3197 | 3 | 0.7%
4052 | 3 | 0.7%
3803 | 3 | 0.7%
3351 | 3 | 0.7%
3217 | 3 | 0.7%
2524 | 3 | 0.7%
Other values (338) | 395 | 92.3%

Value | Count | Frequency (%)
1850 | 1 | 0.2%
2035 | 1 | 0.2%
2055 | 1 | 0.2%
2085 | 1 | 0.2%
2195 | 1 | 0.2%
2255 | 1 | 0.2%
2290 | 1 | 0.2%
2339 | 1 | 0.2%
2340 | 1 | 0.2%
2348 | 1 | 0.2%

Value | Count | Frequency (%)
7190 | 1 | 0.2%
6400 | 1 | 0.2%
6133 | 1 | 0.2%
5969 | 1 | 0.2%
5879 | 1 | 0.2%
5678 | 1 | 0.2%
5590 | 1 | 0.2%
5464 | 1 | 0.2%
5440 | 1 | 0.2%
5423 | 1 | 0.2%

Wheelbase
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 40
Distinct (%): 9.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 108.1542056
Minimum: 89
Maximum: 144
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.5 KiB

Quantile statistics

Minimum: 89
5-th percentile: 95.35
Q1: 103
Median: 107
Q3: 112
95-th percentile: 123
Maximum: 144
Range: 55
Interquartile range (IQR): 9

Descriptive statistics

Standard deviation: 8.311812991
Coefficient of variation (CV): 0.07685150054
Kurtosis: 2.133649204
Mean: 108.1542056
Median Absolute Deviation (MAD): 5
Skewness: 0.9622869732
Sum: 46290
Variance: 69.0862352
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=40)]

Value | Count | Frequency (%)
107 | 45 | 10.5%
103 | 30 | 7.0%
106 | 27 | 6.3%
112 | 25 | 5.8%
104 | 24 | 5.6%
105 | 21 | 4.9%
115 | 20 | 4.7%
111 | 17 | 4.0%
109 | 17 | 4.0%
101 | 16 | 3.7%
Other values (30) | 186 | 43.5%

Value | Count | Frequency (%)
89 | 2 | 0.5%
93 | 9 | 2.1%
95 | 11 | 2.6%
96 | 5 | 1.2%
97 | 3 | 0.7%
98 | 11 | 2.6%
99 | 11 | 2.6%
100 | 7 | 1.6%
101 | 16 | 3.7%
102 | 16 | 3.7%

Value | Count | Frequency (%)
144 | 2 | 0.5%
140 | 1 | 0.2%
137 | 1 | 0.2%
133 | 2 | 0.5%
131 | 1 | 0.2%
130 | 4 | 0.9%
129 | 2 | 0.5%
128 | 2 | 0.5%
126 | 2 | 0.5%
124 | 3 | 0.7%

Length
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 67
Distinct (%): 15.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 186.3621495
Minimum: 143
Maximum: 238
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.5 KiB

Quantile statistics

Minimum: 143
5-th percentile: 163
Q1: 178
Median: 187
Q3: 194
95-th percentile: 212
Maximum: 238
Range: 95
Interquartile range (IQR): 16

Descriptive statistics

Standard deviation: 14.35799126
Coefficient of variation (CV): 0.07704349458
Kurtosis: 0.6147245054
Mean: 186.3621495
Median Absolute Deviation (MAD): 9
Skewness: 0.1819770318
Sum: 79763
Variance: 206.1519129
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Value | Count | Frequency (%)
178 | 27 | 6.3%
190 | 22 | 5.1%
187 | 17 | 4.0%
192 | 16 | 3.7%
188 | 15 | 3.5%
179 | 14 | 3.3%
191 | 14 | 3.3%
177 | 13 | 3.0%
200 | 13 | 3.0%
183 | 12 | 2.8%
Other values (57) | 265 | 61.9%

Value | Count | Frequency (%)
143 | 1 | 0.2%
144 | 1 | 0.2%
150 | 1 | 0.2%
153 | 2 | 0.5%
154 | 1 | 0.2%
155 | 2 | 0.5%
156 | 2 | 0.5%
158 | 2 | 0.5%
159 | 3 | 0.7%
160 | 1 | 0.2%

Value | Count | Frequency (%)
238 | 1 | 0.2%
230 | 1 | 0.2%
227 | 1 | 0.2%
224 | 1 | 0.2%
222 | 2 | 0.5%
221 | 2 | 0.5%
219 | 3 | 0.7%
218 | 3 | 0.7%
215 | 2 | 0.5%
212 | 7 | 1.6%

Interactions

[64 interaction plots (Matplotlib SVG); images not reproduced]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
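
The calculation described above can be sketched in plain Python: rank both variables, then divide the covariance of the ranks by the product of their standard deviations. Tied values receiving the average of their ranks is an assumption here (it is the common convention), and the function names are illustrative.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


# A monotonic but nonlinear relationship: y = x ** 3.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman(x, y), pearson(x, y))
```

For this cubic relationship ρ reaches 1 because the ordering is preserved exactly, while Pearson's r stays below 1 because the relationship is not linear.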

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the line (its angle to the x-axis) does not change the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
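
As a rough illustration of this formula and of the invariance property, the sketch below computes r for the Weight and Length values of the ten "First rows" records, then again after shifting and rescaling both variables. The rescaling constants are arbitrary, and the ten rows are only a small subset of the data, so the result is not the coefficient shown in the report.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    # The 1/(n - 1) factors of covariance and standard deviations cancel out.
    return sxy / sqrt(sxx * syy)


# Weight and Length of the ten "First rows" records (subset of the data).
weight = [4451, 2778, 3230, 3575, 3880, 3893, 3153, 3252, 3638, 3462]
length = [189, 172, 183, 186, 197, 197, 174, 179, 180, 179]

r = pearson(weight, length)

# Arbitrary positive rescaling and shifting of both variables leaves r unchanged.
weight_rescaled = [0.5 * w + 100 for w in weight]
length_rescaled = [2.0 * v - 30 for v in length]
r_rescaled = pearson(weight_rescaled, length_rescaled)

print(r, r_rescaled)
```

Note that a negative scale factor on one variable would flip the sign of r while leaving its magnitude unchanged.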

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
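
The pair-counting definition above corresponds to the variant usually called τ-a. A minimal sketch follows, assuming tied pairs count as neither concordant nor discordant. Statistical libraries often default to the tie-adjusted τ-b instead (SciPy's `kendalltau` does), so values in the report may differ from this formula when ties are present. The function name is illustrative.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, as described in the text."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:        # both variables move in the same direction
            concordant += 1
        elif direction < 0:      # the variables move in opposite directions
            discordant += 1
        # direction == 0 means a tie in x or y: counted as neither
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# One swapped pair out of six: 5 concordant, 1 discordant.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```

With four observations there are six pairs; one swap yields (5 - 1) / 6, which is about 0.67.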

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
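
A minimal sketch of the bias-corrected estimator, following the commonly cited form of Bergsma's correction, is given below. It works on a contingency table of counts with r rows and k columns. The function name is illustrative, no continuity correction is applied, and the exact implementation used to produce this report may differ in such details.

```python
from math import sqrt


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table (list of rows of counts).

    Assumes every row and every column has a non-zero total.
    """
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))


independent = [[10, 10], [10, 10]]   # no association between rows and columns
perfect = [[20, 0], [0, 20]]         # row category determines column category
print(cramers_v_corrected(independent), cramers_v_corrected(perfect))
```

The two toy tables cover the extremes of the scale: independence gives 0 and perfect association gives 1 (up to floating-point rounding).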

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

     Make   Model                       Type    Origin  DriveTrain  MSRP     Invoice  EngineSize  Cylinders  Horsepower  MPG_City  MPG_Highway  Weight  Wheelbase  Length
0    Acura  MDX                         SUV     Asia    All         $36,945  $33,337  3.5         6.0        265         17        23           4451    106        189
1    Acura  RSX Type S 2dr              Sedan   Asia    Front       $23,820  $21,761  2.0         4.0        200         24        31           2778    101        172
2    Acura  TSX 4dr                     Sedan   Asia    Front       $26,990  $24,647  2.4         4.0        200         22        29           3230    105        183
3    Acura  TL 4dr                      Sedan   Asia    Front       $33,195  $30,299  3.2         6.0        270         20        28           3575    108        186
4    Acura  3.5 RL 4dr                  Sedan   Asia    Front       $43,755  $39,014  3.5         6.0        225         18        24           3880    115        197
5    Acura  3.5 RL w/Navigation 4dr     Sedan   Asia    Front       $46,100  $41,100  3.5         6.0        225         18        24           3893    115        197
6    Acura  NSX coupe 2dr manual S      Sports  Asia    Rear        $89,765  $79,978  3.2         6.0        290         17        24           3153    100        174
7    Audi   A4 1.8T 4dr                 Sedan   Europe  Front       $25,940  $23,508  1.8         4.0        170         22        31           3252    104        179
8    Audi   A41.8T convertible 2dr      Sedan   Europe  Front       $35,940  $32,506  1.8         4.0        170         23        30           3638    105        180
9    Audi   A4 3.0 4dr                  Sedan   Europe  Front       $31,840  $28,846  3.0         6.0        220         20        28           3462    104        179

Last rows

     Make   Model                       Type    Origin  DriveTrain  MSRP     Invoice  EngineSize  Cylinders  Horsepower  MPG_City  MPG_Highway  Weight  Wheelbase  Length
418  Volvo  S60 2.5 4dr                 Sedan   Europe  All         $31,745  $29,916  2.5         5.0        208         20        27           3903    107        180
419  Volvo  S60 T5 4dr                  Sedan   Europe  Front       $34,845  $32,902  2.3         5.0        247         20        28           3766    107        180
420  Volvo  S60 R 4dr                   Sedan   Europe  All         $37,560  $35,382  2.5         5.0        300         18        25           3571    107        181
421  Volvo  S80 2.9 4dr                 Sedan   Europe  Front       $37,730  $35,542  2.9         6.0        208         20        28           3576    110        190
422  Volvo  S80 2.5T 4dr                Sedan   Europe  All         $37,885  $35,688  2.5         5.0        194         20        27           3691    110        190
423  Volvo  C70 LPT convertible 2dr     Sedan   Europe  Front       $40,565  $38,203  2.4         5.0        197         21        28           3450    105        186
424  Volvo  C70 HPT convertible 2dr     Sedan   Europe  Front       $42,565  $40,083  2.3         5.0        242         20        26           3450    105        186
425  Volvo  S80 T6 4dr                  Sedan   Europe  Front       $45,210  $42,573  2.9         6.0        268         19        26           3653    110        190
426  Volvo  V40                         Wagon   Europe  Front       $26,135  $24,641  1.9         4.0        170         22        29           2822    101        180
427  Volvo  XC70                        Wagon   Europe  All         $35,145  $33,112  2.5         5.0        208         20        27           3823    109        186